TECHNIQUES FOR SOLVING SUDOKU PUZZLES 

ERIC C. CHI* AND KENNETH LANGE* t 

Abstract. Solving Sudoku puzzles is one of the most popular pastimes in the world. Puzzles 

range in difficulty from easy to very challcnging; the hardest puzzles tend to hâve the most empty cells. 

The current paper explains and compares three algorithms for solving Sudoku puzzles. Backtracking, 

simulated annealing, and altcrnating projections are generic methods for attacking combinatorial 

optimization problems. Our results favor backtracking. It infallibly solvcs a Sudoku puzzle or 

deduces that a unique solution does not exist. However, backtracking does not scale well in high- 

£N| dimensional combinatorial optimization. Hence, it is useful to expose students in the mathematical 

— < sciences to the other two solution techniques in a concrète setting. Simulated annealing shares a 

common structure with MCMC (Markov chain Monte Carlo) and enjoys wide applicability. The 

^s^ method of alternating projections solves the feasibility problem in convex programming. Converting 

a discrète optimization problem into a continuous optimization problem opens up the possibility of 

handling combinatorial problems of much higher dimensionality. 
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1. Introduction. As ail good mathematical scientists know, a broad community 
lias contributed to the invention of modem algorithms. Computer scientists, applied 
mathematicians, statisticians, economists, and physicists, to name just a few, hâve 
made lasting contributions. Exposing students to a variety of perspectives outside the 
realm of their own disciplines sharpens their instincts for modeling and arms them 
with invaluable tools. In this spirit, the current paper discusses techniques for solving 
Sudoku puzzles, one of the most popular pastimes in the world. One could makc the 
same points with more serious applications, but it is hard to imagine a more beguiling 
introduction to the algorithms featured hère. Sudoku diagrams are spécial cases of 
the Latin squares long familiar in expérimental design and, as such, enjoy interesting 
mathematical and statistical properties [11. The complicated constraints encountered 
in solving Sudoku puzzles hâve elicited many clever heuristics that amateurs use 

p^j to good effect. Hère we examine three generic methods with broader scientific and 

societal applications. The fact that one of thèse methods outperforms the other two 
is mostly irrelevant. No two problem catégories are completely alike, and it is best to 

^i try many techniques before declaring a winner. 

t-H The three algorithms tested hère are simulated annealing, alternating projections, 

and backtracking. Simulating annealing is perhaps the most familiar to statisticians. 
It is the optimization analog of MCMC (Markov chain Monte Carlo) and has been 
employed to solve a host of combinatorial problems. The method of alternating projec- 
tions was first proposed by von Neumann [19] to find a feasible point in the intersection 
of a family of hyperplanes. Modem versions of alternating projections more generally 
seek a point in the intersection of a family of closed convex sets. Backtracking is a 
standard technique taken from the toolkits of applied mathematics and computer sci- 
ence. Backtracking infallibly finds ail solutions of a Sudoku puzzle or détermines that 
no solution exists. Its Achilles heel of excessive computational complexity does not 
corne into play with Sudoku puzzles because they are, despite appearances, relatively 
benign computationally. Sudoku puzzles are instances of the satisfiability problem in 
computer science. As problem size increases, such problems are combinatorially hard 
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Fig. 1.1: Sample Puzzle 



and often defy backtracking. For this reason alone, it is useful to examine alternative 
stratégies. 

In a typical Sudoku puzzle, there are 81 cells arranged in a 9-by-9 grid, some 
of which are occupied by numerical clues. See Figure 1.1. The goal is to fill in the 
remaining cells subject to the following three rules: 

1. Each integer between 1 and 9 must appear exactly once in a row, 

2. Each integer between 1 and 9 must appear exactly once in a column, 

3. Each integer between 1 and 9 must appear exactly once in each of the 3-by-3 
subgrids. 

Solving a Sudoku game is a combinatorial task of intermediate complexity. The 
gênerai problem of filling in an incomplète n 2 x n 2 grid with n x n subgrids belongs to 
the class of NP-complete problems [21]. Thèse problems are conjectured to increase 
in computational complexity at an exponential rate in n. Nonetheless, a well planned 
exhaustive search can work quite well for a low value of n such as 9. For larger 
values of n, brute force, no matter how cleverly executed, is simply not an option. In 
contrast, simulated annealing and alternating projections may yield good approximate 
solutions and partially salvage the situation. 

In the rest of this paper, we describe the three methods for solving Sudoku puzzles 
and compare them on a battery of puzzles. The puzzles range in difficulty from pencil 
and paper exercises to hard benchmark tests that often defeat the two approximate 
methods. Our discussion réitérâtes the rationale for equipping students with the best 
computational tools. 

2. Three methods for solving Sudoku. 

2.1. Backtracking. Backtracking systematically grows a partial solution until 
it becomes a full solution or violâtes a constraint [18]. In the latter case it backtracks 
to the next permissible partial solution and begins the growing process anew. The 
advantage of backtracking is that a block of potential solutions can be discarded 
en masse. Backtracking starts by constructing for each empty Sudoku cell (i,j) a 
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■ • S17 7923 7929 



• • • S19 79232 79233 



Infeasible Infeasible 



Fig. 2.1: Backtracking on the puzzle shown in Figure 1.1. Starting from S74S95 = 79, 
the algorithm attempts and fails to grow the solution beyond S74S95S16S17S19 = 79232. 
After failing to grow the solution beyond S74S95S16S17S19 = 79233, ail partial solutions 
beginning with 7923 are eliminated from further considération. The algorithm starts 
anew by attempting to grow S74S95S16S17 = 7929. 



list Lij of compatible digits. This is done by scanning the cell's row, column, and 
subgrid. The empty cells are then ordered by the cardinalities of the lists |£y|. For 
example in Figure 1.1, two cells (7, 4) and (9, 5) possess lists L74 = {7} and L 95 = {9} 
with cardinality 1 and corne first. Next corne cells such as (1,6) with Lie — {2,3}, 
(1,7) with Ln = {3,9}, and (1,9) with L w — {2,3} whose lists hâve cardinality 2. 
Finally corne cells such as (2,9) with L29 = {2,3,5,6,7} whose lists hâve maximum 
cardinality 5. Partial solutions are character strings such as S74S95S16S17S19 taken in 
dictionary order with the alphabet at cell (i,j) limited to the list L^. In dictionary 
order a string such as 7939 is treated as coming after a string such as 79232. 

Backtracking starts with the string 7 by taking the only élément of L74, grows it 
to 79 by taking the only élément of L95, grows it to 792 by taking the first élément of 
Lie, grows it to 7923 by taking the first élément of L17, and finally grows it to 79232 
by taking the first élément of L19. At this juncture a row violation occurs, namely a 2 
in both cells (1,6) and (1,9). Backtracking discards ail strings beginning with 79232 
and moves on to the string 79233 by replacing the first élément of L19 by the second 
élément of Lig. This lcads to another row violation with a 3 in both cells (1, 7) and 
(1,9). Backtracking moves back to the string 7929 by discarding the fifth character 
of 79239 and replacing the first élément of L17 by its second élément. This sets the 
stage for another round of growing. 

Backtracking is also known as depth first search. In this setting the strings are 
viewed as nodes of a tree as depicted in Figure 2.1. Generating strings in dictionary 
order constitutes a tree traversai that systcmatically éliminâtes subtrees and moves 
down and backs up along branches. Because pruning large subtrees is more efficient 
than pruning small subtrees, ordering of cells by cardinality compels the décision 
tree to hâve fewer branches at the top. We use the C code from Skiena and Revilla 
[17] implementing backtracking on Sudoku puzzles. Backtracking has the virtue of 
finding ail solutions when multiple solutions exist. Thus, it provides a mechanism for 
validating the correctness of puzzles. 
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2.2. Simulated Annealing. Simulated annealing [2, 10, 16] attacks a combi- 
natorial optimization problem by defining a state space of possible solutions, a cost 
function quantifying departures from the solution idéal, and a positive température 
parameter. For a satisfiability problem, it is sensible to equate cost to the number of 
constraint violations. Solutions then correspond to states of zéro cost. Each step of 
annealing opérâtes by proposing a move to a new randomly chosen state. Proposais 
are Markovian in the sensé that they dépend only on the current state of the pro- 
cess, not on its past history. Proposed steps that decrease cost are always accepted. 
Proposed steps that increase cost are taken with high probability in the early stages 
of annealing when température is high and with low probability in the late stages of 
annealing when température is low. Inspired by models from statistical physics, sim- 
ulated annealing is designed to sample the state space broadly before settling down 
at a local minimum of the cost function. 

For the Sudoku problem, a state is a 9 x 9 matrix (board) of integers drawn 
from the set {1, . . . , 9}. Each integer appears nine times, and ail numerical dues are 
respected. Annealing starts from any feasible board. The proposai stage randomly 
sélects two différent cells without dues. The corresponding move swaps the contents 
of the cells, thus preserving ail digit counts. To ensure that the most troublesomc 
cells are more likely to be chosen for swapping, we sélect cells non-uniformly with 
probability proportional to exp(z) for a cell involved in i constraint violations. Lct B 
dénote a typical board, c(B) its associated cost, and n the current itération index. 
At température t, we décide whether to accept a proposed neighboring board B by 
drawing a random deviate U uniformly from [0, 1]. If U satisfies 

U < min {exp([c(B„) - c(B)] /r„), 1} , 

then we accept the proposed move and set B n+ i = B. Othcrwise, we reject the 
move and set B rl+1 = B„. Thus, the greater the increase in the number of constraint 
violations, the less likely the move is made to a proposed state. Also, the higher 
the température, the more likely a move is made to an unfavorable state. The final 
ingrédient of simulated annealing is the cooling schedule. In gênerai, the température 
parameter r starts high and slowly déclines to 0, where only favorable or cost neutral 
moves are taken. Typically température is lowered at a slow géométrie rate. 

2.3. Alternating Projections. The method of alternating projections relies 
on projection operators. In the projection problem, one seeks the closest point x in 
a set C C K d to a point y € R d . Distance is quantified by the usual Euclidean norm 
||x — y||. If y already lies in C, then the problem is trivially solved by setting x = y. It 
is well known that a unique minimizer exists whenever the set C is closed and convex 
[11]. We will dénote the projection operator taking y to x by -Pc (y) = x - 

Given a finite collection of closed convex sets with a nonempty intersection, the 
alternating projection algorithm finds a point in that intersection. Consider the case 
of two closed convex sets A and B. The method recursively générâtes a séquence y n 
by taking y = y and y n+1 = P,i(y n ) for n even and y„ +1 = Pb (y„) for n odd. 
Figure 2.2 illustrâtes a few itérations of the algorithm. As suggested by the picture, 
the algorithm does indeed converge to a point in A H B [3] . For more than two closed 
convex sets with nonempty intersection, the method of alternating projections cycles 
through the projections in some fixed order. Convergence occurs in this more gênerai 
case as well based on some simple theory involving paracontractive operators [7] . The 
limit is not guaranteed to be the closest point in the intersection to the original point 
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Fig. 2.2: Alternating projections nnd a point in AC\B, where A and B are closed convex 
sets. The initial point is y. The séquence of points y„ is generated by alternating 
projection onto A with projection onto B. 



y. The related but more complicated procédure known as Dykstra's algorithm [6] 
finds this point. 

It is casy to construct some basic projection operators. For instance, projection 
onto the rectangle R = {x <E M. d : Oj < Xi < bi for ail i} is achieved by defining 
x = -Pfl(y) to hâve componcnts Xi = min{max{aj, j/i}, bi}. This example illustrâtes 
a more gênerai rule; namely if A and B are two closed convex sets, then projection 
onto the Cartesian product A x B is effected by the Cartesian product operator 
(x, y) M- [P J 4(x),Ps(y)]. When A is an entire Euclidean space, -Pa(x) is just the 
identity map. Projection onto the hyperplane 



H = {ye 



v T y 



0} 



is implemented by the operator 

Ptf(x)=X 



Projection onto the unit simplcx U = < x <E R d : ^2 i=1 Xi = 1, Xi > Vi > is more 
subtlc. Fortunately there exist fast algorithms for this purpose [5, 13]. 

In either the alternating projection algorithm or Dykstra's algorithm, it is advan- 
tageous to reduce the number of participating convex sets to the minimum possible 
consistent with fast projection. For instance, it is better to take the unit simplex 
U as a whole rather than as an intersection of the halfspaces {x : Xi > 0} and the 
affine subspace {x : 2i=i x i = •"•}• Because our alternating projection algorithm for 
solving Sudoku puzzles relies on projecting onto several simplexes, it is instructive 
to dérive the Duchi et al [5] projection algorithm. Consider minimization of a con- 
vex smooth fonction /(x) over U. The Karush-Kuhn-Tucker stationarity condition 
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involves sctting the gradient of the Lagrangian 

d 



a a 

£(x, A, /*) = /(x) + *(£>< -!)-£), 



cqual to 0. This is stated in components as the Gibbs criterion 

= ^-/(x) + A - m» 

for multipliera (Xi > obeying the complementary slackness conditions fiiXi — 0. For 
the choice /(x) = ^||x — y|| 2 , the Gibbs condition can be solved in the form 

\y t -X Xi > 

[y t - \ + im Xi = 0. 

If we let 1^ — {i : Xi > 0}, then the equality constraint 

1 = 5Z Xi = ^Vi~ l J +l A 

iel+ i£l + 

implies 



E» 



ieij. 



The catch, of course, is that we do not know I + . 



The key to avoid searching over ail 2 d subsets is the simple observation that the 
Xi and yi are consistently ordered. Suppose on the contrary that yi < yj and Xj < x^. 
For small s > substitute Xj + s for Xj and Xi — s for Xi. The objective function 
/(x) = |||x — y|| 2 then changes by the amount 



(xt -s- yi) 2 + (xj + s - yj) 2 - (Xi - yi) 2 - (xj - yj) 2 = a(yt ~ yj + Xj - Xi) + s 



which is négative for s small. Let Wi dénote the ith largest entry of y. Then the 
Gibbs condition implies that w± > W2 > ■ ■ ■ > w \i + \ > A. Thus, to détermine À we 
seek the largest k such that 




and set A equal to the right hand side of this inequality. With A in hand, the Gibbs 
condition implies that Xi — max{y.j — A,0}. It follows that projection onto U can 
be accomplished in O(dlogd) opérations dominated by sorting. Algorithm 1 displays 
pseudocode for projection onto U. 

Armed with thèse results, we now describe how to solve a continuous relaxation 
of Sudoku by the method of alternating projections. In the relaxed version of the 
problem, we imagine generating candidate solutions by random sampling. Each cell 
(i,j) is assigned a sampling distribution pijk = Pr(SV,- = k) for choosing a random 
deviate Sij G {1, . . . ,9} to populate the cell. If a numerical clue k occupies cell (i,j), 
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Algorithm 1 Projection onto simplex 



W «- SORT_DESCENDING(y). 

k <- max |j : w-, > y ^Li w i) } 

a;, <— max{j/j — A, 0}. 



then we set piji = 1 for l = k and otherwise. A matrix of sampled déviâtes S 
constitutes a candidate solution. It seems reasonable to demand that the average 
puzzle obey the constraints. Once we find a feasible 3-dimensional tensor 5 = {pijk) 
obeying the constraints, a good heuristic for generating an integer solution S is to put 

Sjî ^ max j)iik' 
J ke{i,...M 

In other words, we impute the most probable integer to each unknown cell (i,j)- It 
is easy to construct counterexamples where imputation of the most probable integer 
from a feasible tensor T of the relaxed problem fails to solve the Sudoku puzzle. 

In any case, the remaining agenda is to specify the constraints and the corre- 
sponding projection operators. The requirement that each digit appear in each row 
on average once amounts to the constraint X) 7 =i Pijk = 1 f° r & H i an d k between 1 and 
9. There are 81 such constraints. The requirement that each digit appear in each col- 
umn on average once amounts to the constraint X^<=i Pijk = 1 f° r a ll j an d k between 
1 and 9. Again, there are 81 such constraints. The requirement that each digit appear 
in each subgrid on average once amounts to the constraint Ylj=i 12j=iPa+i,b+j,k = 1 
for all k between 1 and 9 and all a and b chosen from the set {0, 3, 6}. This con- 
tributes another 81 constraints. Finally, the probability constraints ^2k—iPijk = 1 for 
all i and j between 1 and 9 contribute 81 more affine constraints. Hence, there are a 
total of 324 affine constraints on the 9 3 = 729 parameters. In addition there are 729 
nonnegativity constraints pijf. > 0. 

Every numerical clue voids several constraints. For example, if the digit 7 is 
mandated for cell (9,2), then we must take p 92 7 = 1, p^k = for k ^ 7, Pi27 = for 
all i ^= 9, pgjr = for all j '^ 2, and pijj — for all other pairs (i, j) in the (3,1) subgrid. 
In carrying out alternating projection, we eliminate the corresponding variables. With 
this proviso, we cycle through the simplex projections summarized in Algorithm 1. 
The process is very efficient but slightly tedious to code. For the sake of brevity 
we omit the remaining détails. All code used to generate the subséquent results are 
available at https://github.com/echi/Sudoku, and we direct the interested reader 
there. 

3. Comparisons. We generated test puzzles from code available online [20] and 
discarded puzzles that could be completely solved by filling in entries directly implied 
by the initial dues. This left 87 easy puzzles, 130 médium puzzles, and 100 hard 
puzzles. We also downloaded an additional 95 very hard benchmark puzzles [4]. In 
simulated annealing, the température r was initialized to 200 and lowered by a factor 
of 0.99 after every 50 steps. We allowed at most 2 x 10 5 itérations and reset the 
température to 200 if a solution had not been found after 10 5 itérations. For the 
alternating projection algorithm, we commenced projecting from the origin 0. 

Backtracking successfully solved all puzzles. Table 3.1 shows the fraction of puz- 
zles the two heuristics were able to successfully complète. Table 3.2 records summary 
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Alt. Projection Sim. Anncaling Backtracking Number of Puzzles 



Easy 
Médium 
Hard 
Top 95 



0.85 
0.89 
0.72 
0.41 



1.00 
1.00 
0.97 
0.03 



1.00 
1.00 
1.00 
1.00 



87 
130 
100 
95 



Table 3.1: Success rates for solving puzzles of varying difficulty. 



Easy 



Médium 



Hard 



CPU Time (sec) 
Alt. Projection Sim. Annealing Backtracking 



inimum 0.032 


edian 0.041 


ean 0.052 


aximum 0.237 


inimum 0.032 


edian 0.051 


ean 0.062 


aximum 0.269 


inimum 0.033 


edian 0.110 


ean 0.159 


aximum 0.525 



0.006 
0.021 
0.112 
0.970 

0.007 
0.037 
0.231 
3.36 

0.008 
0.753 
1.104 
7.204 



0.007 
0.008 
0.008 
0.009 

0.007 
0.008 
0.008 
0.010 

0.008 
0.008 
0.009 
0.031 



Table 3.2: Summary statistics on the run times for différent methods on puzzles of 
varying difficulty. For the alternating projection and simulated annealing techniques, 
only successfully solved puzzles are included in the statistics. 



statistics for the CPU time taken by each method for each puzzle category. Ail com- 
putations were done on an iMac computer with a 3.4 GHz Intel Core i7 processor and 
8 GB of RAM. We implemented the alternating projection and simulated annealing 
algorithms in Fortran 95. For backtracking we relied on the existing implementation 
inC. 

The comparisons show that backtracking performs best, and for the vast majority 
of 9 x 9 Sudoku problems it is probably going to be hard to beat. Simulated annealing 
finds the solution except for a handful of the most challenging paper and pencil 
problems, but its maximum run times are unimpressive. While alternating projection 
does not perform as well on the pencil and paper problems compared to the other 
two algorithms, it does not do terribly either. Moreover, we see hints of the tables 
turning on the hard puzzles. 

Simulated annealing strugglcs mightily on the 95 benchmark puzzles. Closer 
inspection of individual puzzles reveals that thèse very hard puzzles admit many local 
minima with just a few constraint violations. Figure 3.1 shows a typical local minimum 
that traps the simulated annealing algorithm. Additionally, something curious is 
happening in Figure 3.2, which plots CPU solution times for alternating projection 
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Fig. 3.1: A typical local minimum that traps simulated annealing in a top 95 puzzle. 
Clues are shaded light gray. There are two column constraint violations caused by 
the cells shaded dark gray. The local minimum is deep in the sensé that ail one-step 
swaps resuit in further constraint violations. 



versus backtracking. Points below the dashed line indicate puzzles that the method of 
altcrnating projection solves more efficiently than backtracking. It appears that when 
the method of alternating projections finds correct solutions to very hard problems, 
it tends to find them more quickly than backtracking. 

4. Discussion. It goes almost without saying that students of the mathematical 
sciences should be taught a variety of solution techniques for combinatorial problems. 
Because Sudoku puzzles are easy to state and culturally neutral, they furnish a good 
starting point for the educational task ahead. It is important to stress the contrast be- 
tween exact stratégies that scale poorly with problcm sizc and approximatc stratégies 
that adapt more gracefully. The performance of the alternating projection algorithm 
on the benchmark tests suggest it may hâve a rôle in solving much harder combina- 
torial problems. Certainly, the electrical engineering community takes this attitude, 
given the close kinship of Sudoku puzzles to problems in coding theory [8, 9, 14, 15]. 

One can argue that algorithm development has assumed a dominant rôle within 
the mathematical sciences. Three inter-related trends are feeding this phenomenon. 
First, Computing power continues to grow. Execution times are dropping, and com- 
puter memory is getting cheaper. Second, good Computing simply tempts scientists 
to tackle larger data sets. Third, certain fields, notably communications, imaging, 
genomics, and économies generate enormous amounts of data. AU of thèse fields cre- 
ate problems in combinatorial optimization. For instance, modem DNA sequencing is 
still challcnged by the phase problem of discerning the maternai and paternal origin 
of genetic variants. Computation is also being adopted more often as a means of 
proving propositions. The claim that at least 17 numerical clues are needed to ensure 
uniqueness of a Sudoku solution has apparently been proved using intelligent brute 
force [12]. Mathematical scientists need to be aware of computational developments 
outside their classical application silos. Importing algorithms from outside fields is 
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Fig. 3.2: Scatterplot of solution times for the top 95 benchmark problems. 



onc of thc quickest means of refreshing an existing field. 
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